A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data
Abstract
MOTIVATION: Accurate determination of single-nucleotide polymorphisms (SNPs) from next-generation sequencing data is a significant challenge facing bioinformatics researchers. Most current methods use mechanistic models that assume nucleotides aligning to a given reference position are sampled from a binomial distribution. While such methods are sensitive, they are often unable to discriminate errors resulting from misaligned reads, sequencing errors or platform artifacts from true variants.

RESULTS: To enable more accurate SNP calling, we developed an algorithm that uses a trained support vector machine (SVM) to determine variants from .BAM or .SAM formatted alignments of sequence reads. Our SVM-based implementation determines SNPs with significantly greater sensitivity and specificity than alternative platforms, including the UnifiedGenotyper included with the Genome Analysis Toolkit, samtools and FreeBayes. In addition, the quality scores produced by our implementation more accurately reflect the likelihood that a variant is real when compared with those produced by the Genome Analysis Toolkit. While results depend on the model used, the implementation includes tools to easily build new models and refine existing models with additional training data.

AVAILABILITY: Source code and executables are available from github.com/brendanofallon/SNPSVM/
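The binomial assumption mentioned in the abstract can be illustrated with a minimal sketch. This is not the method of any tool named above: the function names, the 1% per-base error rate and the significance threshold are all hypothetical choices made for illustration. It asks how likely it is to see at least a given number of non-reference bases at one position if every mismatch were an independent sequencing error.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def looks_like_variant(depth: int, alt_count: int,
                       error_rate: float = 0.01,   # assumed per-base error rate
                       alpha: float = 1e-6) -> bool:  # assumed threshold
    """Flag a position when the non-reference count is very unlikely
    to arise from sequencing error alone (hypothetical decision rule)."""
    return binomial_tail(depth, alt_count, error_rate) < alpha


if __name__ == "__main__":
    # 14 of 30 reads disagree with the reference: hard to explain by error.
    print(looks_like_variant(30, 14))
    # 1 of 30 reads disagrees: consistent with a single sequencing error.
    print(looks_like_variant(30, 1))
```

A rule of this kind sees only read counts, so a pile of misaligned reads is indistinguishable from a true variant. That is the limitation the abstract attributes to count-based models, and the reason a classifier trained on additional per-site features may discriminate better.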
Similar resources
A Novel Support Vector Machine-Based Approach for Rare Variant Detection
Advances in next-generation sequencing technologies have enabled the identification of multiple rare single nucleotide polymorphisms involved in diseases or traits. Several strategies for identifying rare variants that contribute to disease susceptibility have recently been proposed. An important feature of many of these statistical methods is the pooling or collapsing of multiple rare single n...
Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms
Because acquiring field reference data with conventional methods is limited and time-consuming, in recent years it has become commonplace to generate reference data for forest studies from terrestrial laser scanner, aerial laser scanner, radar and optical data, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...
Grapevine acidity: SVM tool development and NGS data analyses
Single Nucleotide Polymorphisms (SNPs) represent the most abundant type of genetic variation and they are a valuable tool for several biological applications like linkage mapping, integration of genetic and physical maps, population genetics as well as evolutionary and protein structure-function studies. SNP genotyping by mapping DNA reads produced via Next generation sequencing (NGS) ...
Insight into Neutral and Disease-Associated Human Genetic Variants through Interpretable Predictors
A variety of methods that predict human nonsynonymous single nucleotide polymorphisms (SNPs) to be neutral or disease-associated have been developed over the last decade. These methods are used for pinpointing disease-associated variants in the many variants obtained with next-generation sequencing technologies. The high performances of current sequence-based predictors indicate that sequence d...
Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data
This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...
Journal: Bioinformatics
Volume 29, Issue 11
Publication date: 2013